ref_time_ticks: normalize to nanoseconds on every platform #2685
Merged
Until this commit, ref_time_ticks() returned raw QueryPerformanceCounter
ticks on Windows (~10 MHz) and clock_gettime nanoseconds on Linux/macOS.
Any caller that did raw arithmetic on the result -- the natural
    let deadline = ref_time_ticks() + int64(timeout_sec * 1_000_000)
    while (ref_time_ticks() < deadline) { ... }
deadline pattern -- silently got 30 s on Windows (lucky math at 10 MHz)
and 30 ms on POSIX (1000x too short). Recently surfaced as a CI hang in
dasImgui's playwright harness, which read deadlines that elapsed instantly
on Linux/macOS runners.
Normalize ref_time_ticks() to nanoseconds on Windows by dividing the QPC
counter by the cached QueryPerformanceFrequency. The conversion uses a
split whole+remainder fold so the intermediate never overflows int64.
QPF is cached once per process (invariant after boot, race-tolerant).
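The split fold described above can be sketched roughly as follows. This is a minimal illustration, not the code from the commit: the function name `ticks_to_ns_split` and the constant `kNsPerSec` are hypothetical, and the counter frequency is passed in as a parameter rather than read from `QueryPerformanceFrequency`, so the sketch compiles on any platform.

```cpp
#include <cassert>
#include <cstdint>

constexpr int64_t kNsPerSec = 1000000000LL;

// Hypothetical helper (name is illustrative, not taken from the commit).
// Converts a raw counter reading to nanoseconds without overflowing int64.
// The naive form ticks * kNsPerSec / freq overflows once ticks exceeds
// INT64_MAX / 1e9 (about 9.2e9 ticks, roughly 15 minutes at 10 MHz).
inline int64_t ticks_to_ns_split(int64_t ticks, int64_t freq) {
    assert(freq > 0);
    const int64_t whole = ticks / freq;  // whole seconds since the counter epoch
    const int64_t rem = ticks % freq;    // leftover ticks, always < freq
    // whole * 1e9 stays in range for centuries of uptime; rem * 1e9 stays in
    // range as long as freq is below ~9.2 GHz.
    return whole * kNsPerSec + rem * kNsPerSec / freq;
}
```

With a 10 MHz counter, `ticks_to_ns_split(1, 10000000)` yields 100, which matches the 100 ns tick period.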
Helpers (get_time_usec, get_time_nsec, ref_time_delta_to_usec) become
trivial subtraction/division now that the unit is uniform; the Windows
ref_time_delta_to_usec also loses a long-standing QPC-for-QPF typo that
made it return garbage in the old implementation. The POSIX helpers are
unchanged in semantics, just tidied for symmetry.
All in-tree callers (C++ and .das, including dastest, daslib/profiler,
strudel, examples, MCP tools) already use the safe `let t0 = ref_time_ticks()`
+ `get_time_usec(t0)` pattern, so no caller changes are needed.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Pull request overview
Normalizes ref_time_ticks() to return CLOCK_MONOTONIC-style nanoseconds on Windows, matching the existing Linux/macOS behavior. Previously the Windows implementation returned raw QueryPerformanceCounter ticks (~10 MHz typical), causing silently wrong arithmetic in deadline patterns shared between POSIX and Windows. Also fixes a long-standing typo (QueryPerformanceCounter(&freq) instead of QueryPerformanceFrequency) in the old ref_time_delta_to_usec.
Changes:
- Windows `ref_time_ticks()` now converts QPC ticks to ns using a split `whole + remainder` fold to avoid int64 overflow, with a cached `QueryPerformanceFrequency` value via a new `qpc_freq()` helper.
- `get_time_usec`, `get_time_nsec`, and `ref_time_delta_to_usec` collapse to trivial subtraction/division on all platforms now that units are uniform; the buggy QPC-instead-of-QPF call is removed.
- Adds a top-of-file doc comment explaining the unification and steering callers toward `get_time_usec`/`get_time_nsec` wrappers.
The previous commit's split conversion (whole+rem with two idiv per call) doubled the cost of ref_time_ticks() from ~22 ns to ~40 ns on modern Windows. That matters for the function profiler, which brackets every profiled call: ~36 ns of profiler skew per function vs ~14 ns before.

Precompute `qpc_ns_per_tick = 1e9 / freq` once when QPF divides 1e9 cleanly. On every Win 7+ box that's the case (QPF is fixed at 10 MHz, ns_per_tick = 100), so the hot path collapses to one multiply and the call returns to ~23 ns -- within 1 ns of the bare QueryPerformanceCounter cost.

The split fallback stays for paranoid completeness on non-divisible QPF (theoretical; not observed on shipping Windows hardware in years).

Microbench on this box (QPF=10MHz, MSVC /O2, 50M iterations):

    ref_old        (raw QPC, returns ticks)      22.3 ns/call   1.00x
    ref_new_split  (whole+rem -> ns, previous)   39.8 ns/call   1.79x
    ref_new_fast   (ticks * ns_per_tick)         23.0 ns/call   1.03x

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
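The fast path with a split fallback might look something like the sketch below. It is only an approximation of the idea in this commit message: `TickConverter`, its fields, and `kNsPerSec` are hypothetical names, and the frequency is supplied by the caller instead of being cached from `QueryPerformanceFrequency`.

```cpp
#include <cassert>
#include <cstdint>

constexpr int64_t kNsPerSec = 1000000000LL;

// Hypothetical converter (struct and field names are illustrative).
struct TickConverter {
    int64_t freq;         // counter frequency in Hz
    int64_t ns_per_tick;  // kNsPerSec / freq when that division is exact, else 0

    explicit TickConverter(int64_t f) : freq(f), ns_per_tick(0) {
        assert(f > 0);
        // Precompute the multiplier once, only when freq divides 1e9 cleanly.
        if (kNsPerSec % f == 0) ns_per_tick = kNsPerSec / f;
    }

    int64_t to_ns(int64_t ticks) const {
        // Fast path: a single multiply (e.g. 100 ns per tick at 10 MHz).
        if (ns_per_tick != 0) return ticks * ns_per_tick;
        // Fallback: split whole + remainder so the intermediate fits in int64.
        const int64_t whole = ticks / freq;
        const int64_t rem = ticks % freq;
        return whole * kNsPerSec + rem * kNsPerSec / freq;
    }
};
```

Constructing the converter once and reusing it keeps both the divisibility check and the division out of the per-call path, which is the point of the optimization described above.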
Update the handmade RST blurb for builtin::ref_time_ticks to reflect the new contract: monotonic timestamp in nanoseconds, raw subtraction valid since the unit is uniform across Windows/Linux/macOS. The previous wording described the return value as opaque "ticks", which was platform-dependent and led to caller-side deadline-math bugs (the dasImgui CI hang).

Sphinx clean build succeeded, zero new warnings.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
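Under the nanosecond contract described above, deadline math reduces to plain integer arithmetic. The sketch below assumes a monotonic nanosecond clock and uses `std::chrono::steady_clock` as a stand-in for `ref_time_ticks()`; the names `now_ns`, `deadline_after_sec`, and `elapsed_usec` are hypothetical and do not come from the PR.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Stand-in for ref_time_ticks(): a monotonic timestamp in nanoseconds.
// (Assumption: the real function lives in the runtime and is not called here.)
inline int64_t now_ns() {
    using namespace std::chrono;
    return duration_cast<nanoseconds>(steady_clock::now().time_since_epoch()).count();
}

// Hypothetical helper: a deadline expressed in the same nanosecond unit.
// The scale factor is 1e9 (ns per second), not 1e6.
inline int64_t deadline_after_sec(int64_t start_ns, double timeout_sec) {
    assert(timeout_sec >= 0.0);
    return start_ns + static_cast<int64_t>(timeout_sec * 1e9);
}

// Hypothetical helper: elapsed microseconds is subtraction plus one division.
inline int64_t elapsed_usec(int64_t start_ns, int64_t current_ns) {
    return (current_ns - start_ns) / 1000;
}
```

A polling loop would then compare `now_ns()` against `deadline_after_sec(now_ns(), 30.0)` and behave the same on every platform.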
This was referenced May 16, 2026
Copilot AI added a commit that referenced this pull request on May 16, 2026
… post-PR #2685 ns normalization
Agent-Logs-Url: https://github.com/GaijinEntertainment/daScript/sessions/97224dec-45d1-4968-a3dd-8e5f37274983
Co-authored-by: borisbat <272689+borisbat@users.noreply.github.com>
pull Bot pushed a commit to forksnd/daScript that referenced this pull request on May 17, 2026
PR GaijinEntertainment#2685 normalized ref_time_ticks() to nanoseconds across every platform (Windows used to return raw QPC ticks at the underlying counter's frequency, typically 10 MHz). The fix shipped without a unit test that would have caught a units regression.

Add four tests under tests/fio/perf_time.das (sleep() lives in fio, so this is the right neighborhood):

- monotonic: 1000 successive reads never go backwards. Catches any signed/unsigned mixup or wrap-around bug in the ns conversion arithmetic.
- sleep_roundtrip: sleep(100 ms) -> delta_ns must land in [80 ms, 500 ms]. The 80 ms lower bound is the load-bearing assertion: if Windows reverted to raw QPC ticks (10 MHz counter on the typical box -> a 100 ms wall-clock sleep would surface as 1000000 "ticks" interpreted as ns, i.e. 1 ms), the test would trip. Wide upper bound covers CI runner scheduler jitter.
- get_time_usec_agrees: the get_time_usec(t0) helper agrees with (ref_time_ticks() - t0) / 1000 within 5 ms. Two helpers reading the same underlying clock should not drift; if one ever ends up on a different code path, this notices.
- units_are_nanoseconds: three back-to-back sleep(100 ms) deltas stay within 200 ms spread. If the unit accidentally changed mid-run (think: thread-local frequency cache going stale), the deltas would diverge wildly.

The test runs cleanly in both interpreter and AOT mode on Windows (Win11 local): sleep(100 ms) -> 102-109 ms delta, get_time_usec agrees to within microseconds.

tests/aot/CMakeLists.txt:224 already covers tests/fio/*.das via FILE(GLOB CONFIGURE_DEPENDS); cmake reconfigure picks the new file up automatically.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
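The reasoning behind the 80 ms lower bound in sleep_roundtrip can be checked with a few lines of arithmetic. The sketch below is a C++ analogue, not the daslang test itself; `sleep_delta_plausible`, `raw_ticks_for`, and `kNsPerMs` are hypothetical names introduced for illustration.

```cpp
#include <cassert>
#include <cstdint>

constexpr int64_t kNsPerMs = 1000000LL;

// Hypothetical predicate mirroring the [80 ms, 500 ms] window applied to the
// measured delta of a 100 ms sleep.
inline bool sleep_delta_plausible(int64_t delta_ns) {
    return delta_ns >= 80 * kNsPerMs && delta_ns <= 500 * kNsPerMs;
}

// Number of raw counter ticks that elapse during a sleep of sleep_ms at
// freq_hz. If a regression returned this value as if it were nanoseconds,
// it is the delta the test would observe.
inline int64_t raw_ticks_for(int64_t sleep_ms, int64_t freq_hz) {
    assert(freq_hz > 0);
    return sleep_ms * freq_hz / 1000;
}
```

At 10 MHz, `raw_ticks_for(100, 10000000)` is 1000000, which read as nanoseconds is 1 ms and falls well below the 80 ms floor, while a correct reading near 100 ms passes.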
pull Bot pushed a commit to forksnd/daScript that referenced this pull request on May 17, 2026
…ssion

Cards added in the course of the linq_fold splice rewrite + PR GaijinEntertainment#2691 (has_sideeffects + counter-lane elision). Topics:

linq_fold / macro-emission patterns:
- daslang-generic-instance-detect-via-fromgeneric: func.fromGeneric is the canonical "which generic was this instantiated from?" link; func.name on typed instances is mangled.
- daslib-macro-boost-has-sideeffects-predicate: new public predicate, full classification table, known limitations, test plumbing.
- qmacro-invoke-source-bind-typedecl-modifier-iter-vs-array: typedecl block-param const/ref handling differs between iterator and array sources; the two diagnostic error messages tell you which branch you picked wrong.
- qmacro-gensym-per-callsite-via-lineinfo: backtick-prefixed names + line+column suffix, force_at / force_generated / can_shadow.
- my-fold-macro-emits-a-loop-with-for-it-in-source-... (UPDATED): peel_each pattern corrected for generic-instance detection + positive array gate + block-param typedecl handling.

LINQ semantics:
- are-there-parity-tests-in-tests-linq-that-compare-fold-output-to-...
- which-typedecl-predicates-identify-types-where-length-expr-is-...
- why-does-each-arr-fail-with-unsafe-when-not-source-of-for-loop-...
- what-s-the-right-sqlite-linq-chain-form-for-aggregates-sum-min-max-...
- my-macro-substitutes-it-for-a-projection-expression-via-template-...
- when-a-call-macro-needs-to-pick-copy-vs-move-init-for-a-projection-...
- where-does-nolint-rule-go-when-a-lint-warning-is-emitted-from-inside-...

Tooling / ops:
- how-do-i-run-dastest-in-benchmark-only-mode-and-what-s-the-command-...
- cpp-profiler-macos-samply-instruments.md
- what-s-the-end-to-end-checklist-for-adding-a-new-daslib-das-module-...
- how-do-i-call-a-dasimgui-or-any-managed-c-method-on-a-struct-field-...

Updated:
- why-does-my-dastest-integration-test-hang-at-readiness-gate-failed-...: original card pointed at a require-order red herring; real cause was ref_time_ticks() returning ns on POSIX while wait_until_ready's deadline math assumed μs. Fix landed in PR GaijinEntertainment#2685.

No code changes, docs only.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
`ref_time_ticks()` now returns CLOCK_MONOTONIC-style nanoseconds on every platform. Previously it returned raw `QueryPerformanceCounter` ticks (~10 MHz typical) on Windows but `clock_gettime` nanoseconds on Linux/macOS. That unit mismatch silently broke the natural deadline pattern, `let deadline = ref_time_ticks() + int64(timeout_sec * 1_000_000)` followed by `while (ref_time_ticks() < deadline) { ... }`: 30 s on Windows (lucky math at 10 MHz), 30 ms on POSIX (1000× too short). Surfaced as a CI hang in dasImgui's playwright harness; deadlines elapsed instantly on Linux/macOS runners while Windows happened to land in the right ballpark.
What changed
- On Windows, `ref_time_ticks()` divides QPC by the cached `QueryPerformanceFrequency` and returns nanoseconds. Conversion uses a split `whole + remainder * 1e9 / freq` fold so the intermediate never overflows int64.
- `qpc_freq()` caches the QPF static; QPF is invariant after boot, and the race is benign (parallel initialisers compute the same value).
- `get_time_usec`, `get_time_nsec`, `ref_time_delta_to_usec` collapse to trivial sub/div now that the unit is uniform.
- Removes a long-standing typo in the Windows `ref_time_delta_to_usec` (it called `QueryPerformanceCounter(&freq)` instead of `QueryPerformanceFrequency`, so it was returning garbage).
- Adds a top-of-file doc comment steering callers toward `get_time_usec(start)` / `get_time_nsec(start)`.
Callers
All in-tree callers, both C++ (`ast_parse`, `ast_simulate`, `runtime_profile`, builtins) and `.das` (dastest, daslib/profiler, strudel, examples, MCP tools), already use the safe `let t0 = ref_time_ticks()` + `get_time_usec(t0)` pattern, so no caller-side changes are required. The fix is purely in `src/hal/performance_time.cpp`.
Test plan
- `sleep(1000ms)` after `ref_time_ticks()` → `get_time_usec` reports ~1_000_000, `get_time_nsec` ~1_000_000_000, raw delta in ns ballpark
🤖 Generated with Claude Code